9 research outputs found

    Scalable Probabilistic Similarity Ranking in Uncertain Databases (Technical Report)

    This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes, for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution with dynamic programming of quadratic complexity. In this paper, we show both theoretically and experimentally that our framework reduces this to linear-time complexity with the same memory requirements, which is made possible by accessing the uncertain vector instances incrementally in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking to the objects according to different state-of-the-art definitions. An experimental evaluation on synthetic and real data demonstrates the efficiency of our approach.
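
To make the notion of a rank probability distribution concrete, the following is a minimal sketch of the naive quadratic baseline, not the linear-time framework of the paper. It assumes that objects are independent of each other, that each object is given as a list of hypothetical `(distance, probability)` instances, and that the result is conditioned on the chosen instance being the realized one. The function name `rank_distribution` is an illustration only.

```python
from typing import Dict, List, Tuple

Instance = Tuple[float, float]  # (distance to the reference object, instance probability)


def rank_distribution(objects: Dict[str, List[Instance]],
                      target: str,
                      instance_distance: float) -> List[float]:
    """Return r where r[i] is the probability that the given instance of
    `target` is ranked at position i + 1, assuming independent objects."""
    # For every other object, the probability that it lies closer to the
    # reference object than the chosen instance does.
    closer = [
        sum(p for d, p in instances if d < instance_distance)
        for name, instances in objects.items()
        if name != target
    ]
    # dist[k] = probability that exactly k other objects are closer.
    dist = [1.0]
    for p in closer:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)   # this object is not closer
            nxt[k + 1] += q * p       # this object is closer
        dist = nxt
    return dist


objects = {
    "A": [(1.0, 0.5), (4.0, 0.5)],
    "B": [(2.0, 0.5), (3.0, 0.5)],
    "C": [(5.0, 1.0)],
}
ranks_b = rank_distribution(objects, "B", 3.0)
```

Each call costs time quadratic in the number of objects, which matches the complexity the abstract attributes to existing dynamic programming approaches.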

    Statistical Density Prediction in Traffic Networks

    Modern tracking methods have recently made it possible to capture the positions of massive numbers of moving objects. Given this information, it is possible to analyze and predict the traffic density in a network, which offers valuable information for traffic control and for congestion prediction and prevention. In this paper, we propose a novel statistical approach to predict the density on any edge of such a network at some time in the future. Our method is based on short-time observations of the traffic history, so knowing the destination of each traveling individual is not required. Instead, we assume that individuals act rationally and choose the shortest path from their starting points to their destinations. Based on this assumption, we introduce a statistical approach to describe the likelihood that any given individual in the network is located at a certain position at a certain time. Since determining this likelihood is quite expensive when done in a straightforward way, we propose an efficient method based on a suffix tree to speed up the prediction. In our experiments, we show that our approach makes useful predictions about the traffic density and illustrate the efficiency of our new algorithm when calculating these predictions.

    Probabilistic frequent itemset mining in uncertain databases

    Probabilistic frequent itemset mining in uncertain transaction databases differs semantically and computationally from traditional techniques applied to standard “certain” transaction databases. The consideration of existential uncertainty of item(sets), indicating the probability that an item(set) occurs in a transaction, makes traditional techniques inapplicable. In this paper, we introduce new probabilistic formulations of frequent itemsets based on possible world semantics. In this probabilistic context, an itemset X is called frequent if the probability that X occurs in at least minSup transactions is above a given threshold τ. To the best of our knowledge, this is the first approach addressing this problem under possible world semantics. Based on these probabilistic formulations, we present a framework that solves the Probabilistic Frequent Itemset Mining (PFIM) problem efficiently. An extensive experimental evaluation investigates the impact of our proposed techniques and shows that our approach is orders of magnitude faster than straightforward approaches.
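
The definition of a probabilistic frequent itemset can be illustrated by brute-force enumeration of possible worlds. This is a sketch of the straightforward, exponential-time approach, not the efficient framework of the paper. It assumes that items occur independently, that each transaction is a hypothetical mapping from item to existential probability, and that the names `frequentness_probability` and `is_probabilistic_frequent` are illustrative.

```python
from itertools import product
from math import prod
from typing import Dict, Iterable, List

Transaction = Dict[str, float]  # item -> existential probability


def containment_probability(transaction: Transaction, itemset: Iterable[str]) -> float:
    """Probability that all items of the itemset occur in the transaction,
    assuming items are independent; a missing item has probability 0."""
    return prod(transaction.get(item, 0.0) for item in itemset)


def frequentness_probability(database: List[Transaction],
                             itemset: Iterable[str],
                             min_sup: int) -> float:
    """P(itemset occurs in at least min_sup transactions), computed by
    enumerating every possible world (exponential in the database size)."""
    items = list(itemset)
    probs = [containment_probability(t, items) for t in database]
    total = 0.0
    # A world fixes, for each transaction, whether it contains the itemset.
    for world in product((False, True), repeat=len(probs)):
        if sum(world) >= min_sup:
            total += prod(p if present else 1.0 - p
                          for p, present in zip(probs, world))
    return total


def is_probabilistic_frequent(database: List[Transaction],
                              itemset: Iterable[str],
                              min_sup: int,
                              tau: float) -> bool:
    """Frequent if the frequentness probability is above the threshold tau."""
    return frequentness_probability(database, itemset, min_sup) > tau


database = [{"a": 1.0, "b": 0.5}, {"a": 0.5}, {"a": 1.0, "b": 1.0}]
p_ab = frequentness_probability(database, ["a", "b"], min_sup=2)
```

With n transactions the loop visits 2^n worlds, which is why this form is only usable on toy inputs.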

    Scalable Probabilistic Similarity Ranking in Uncertain Databases


    Similarity Search on Uncertain Spatio-temporal Data

    In this work, we address the problem of similarity search in a database of uncertain spatio-temporal objects. Each object is defined by a set of observations ((time, location)-tuples) and a Markov chain that describes the object's uncertain motion in space and time. To model similarity, which is an important building block for many applications such as identifying frequent motion patterns or trajectory clustering, we employ the well-known Longest Common Subsequence (LCSS) measure, which becomes a distribution on uncertain spatio-temporal data (ULCSS). We show how the aligned version (without time shifting) of the ULCSS can be computed exactly in PTIME, which is also verified by extensive experiments.
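
For orientation, the LCSS measure on ordinary (certain) trajectories can be sketched as follows. This is not the ULCSS computation of the paper: it assumes deterministic trajectories given as lists of coordinate tuples and a hypothetical matching threshold `eps`, and it shows both the general variant and the aligned variant without time shifting.

```python
from math import dist
from typing import List, Sequence

Point = Sequence[float]


def lcss(a: List[Point], b: List[Point], eps: float) -> int:
    """Length of the longest common subsequence of two trajectories, where
    two points match if their Euclidean distance is at most eps."""
    n, m = len(a), len(b)
    # table[i][j] = LCSS of the first i points of a and the first j points of b.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if dist(a[i - 1], b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def aligned_lcss(a: List[Point], b: List[Point], eps: float) -> int:
    """Aligned variant: only points with the same time index may match."""
    return sum(1 for p, q in zip(a, b) if dist(p, q) <= eps)


track_a = [(0, 0), (1, 0), (2, 0), (3, 0)]
track_b = [(1, 0), (2, 0), (3, 0), (9, 9)]
shifted = lcss(track_a, track_b, eps=0.1)
aligned = aligned_lcss(track_a, track_b, eps=0.1)
```

The two tracks share three points when time shifting is allowed but none at equal time indices, which shows how much the aligned variant restricts matching.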